Excel files and CSV into Python Pandas#
Loading data into Python from standard Excel files and comma-separated value (CSV) data is fundamental for many data scientists and analysts.
Excel and CSV files are among the most common data storage formats. Python provides various tools to read, manipulate, and analyze this data. In this process, Excel files can be read using libraries like pandas
, xlrd
, and openpyxl
. In contrast, CSV files can be imported using the built-in csv
module or the pandas
library.
Understanding how to load data from these file formats is essential for data analysis and machine learning tasks in Python. This article will explore different methods to read data from Excel and CSV files and analyze some everyday use cases.
How To#
import pandas as pd
df = pd.read_excel("data/housing.xlsx", engine="openpyxl")
df.head(5)
longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value | ocean_proximity | |
---|---|---|---|---|---|---|---|---|---|---|
0 | -122.23 | 37.88 | 41.0 | 880.0 | 129.0 | 322.0 | 126.0 | 3252-08-01 00:00:00 | 452600.0 | NEAR BAY |
1 | -122.22 | 37.86 | 21.0 | 7099.0 | 1106.0 | 2401.0 | 1138.0 | 3014-08-01 00:00:00 | 358500.0 | NEAR BAY |
2 | -122.24 | 37.85 | 52.0 | 1467.0 | 190.0 | 496.0 | 177.0 | 2574-07-01 00:00:00 | 352100.0 | NEAR BAY |
3 | -122.25 | 37.85 | 52.0 | 1274.0 | 235.0 | 558.0 | 219.0 | 6431-05-01 00:00:00 | 341300.0 | NEAR BAY |
4 | -122.25 | 37.85 | 52.0 | 1627.0 | 280.0 | 565.0 | 259.0 | 8462-03-01 00:00:00 | 342200.0 | NEAR BAY |
pd.read_csv("data/housing.csv").head()
longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value | ocean_proximity | |
---|---|---|---|---|---|---|---|---|---|---|
0 | -122.23 | 37.88 | 41.0 | 880.0 | 129.0 | 322.0 | 126.0 | 8.3252 | 452600.0 | NEAR BAY |
1 | -122.22 | 37.86 | 21.0 | 7099.0 | 1106.0 | 2401.0 | 1138.0 | 8.3014 | 358500.0 | NEAR BAY |
2 | -122.24 | 37.85 | 52.0 | 1467.0 | 190.0 | 496.0 | 177.0 | 7.2574 | 352100.0 | NEAR BAY |
3 | -122.25 | 37.85 | 52.0 | 1274.0 | 235.0 | 558.0 | 219.0 | 5.6431 | 341300.0 | NEAR BAY |
4 | -122.25 | 37.85 | 52.0 | 1627.0 | 280.0 | 565.0 | 259.0 | 3.8462 | 342200.0 | NEAR BAY |
pd.read_csv("https://raw.githubusercontent.com/freeCodeCamp/open-data/master/medium-fCC-data/data/medium_titles%20-%20rawdata.tsv", sep="\t", )
order | inv-ord | selection | CollTitle | CorrectedTitle | OrigRecommends | classfication | Recommends | class_WD | class_WD&CS | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 541 | 112 | 0.0 | Angular 2 versus React: There Will Be BloodIn ... | Angular 2 versus React: There Will Be Blood | 2.5K | TECH | 2500.0 | 0.0 | 0.0 |
1 | 18 | 635 | 1.0 | The mind-blowing AI announcement from Google t... | The mind-blowing AI announcement from Google t... | 2.4K | TECH | 2400.0 | 0.0 | 0.0 |
2 | 650 | 3 | 0.0 | Things I Wish Someone Had Told Me When I Was L... | Things I Wish Someone Had Told Me When I Was L... | 3.2K | MOT | 3200.0 | 0.0 | 0.0 |
3 | 94 | 559 | 1.0 | How to encrypt your entire life in less than a... | How to encrypt your entire life in less than a... | 7.4K | SEC | 7400.0 | 0.0 | 0.0 |
4 | 409 | 244 | 0.0 | Being A Developer After 40In freeCodeCampView ... | Being A Developer After 40 | 6K | MOT | 6000.0 | 0.0 | 0.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
647 | 636 | 17 | NaN | December 2014 | December 2014 | NaN | date | NaN | NaN | NaN |
648 | 644 | 9 | NaN | November 2014 | November 2014 | NaN | date | NaN | NaN | NaN |
649 | 647 | 6 | NaN | October 2014 | October 2014 | NaN | date | NaN | NaN | NaN |
650 | 649 | 4 | NaN | November 2013 | November 2013 | NaN | date | NaN | NaN | NaN |
651 | 651 | 2 | NaN | October 2013 | October 2013 | NaN | date | NaN | NaN | NaN |
652 rows × 10 columns
Exercise#
Change the data loading to contain the correct data types and explore the keywords for reading CSV files and Excel files using Shift + Tab
.
pd.read_csv("https://raw.githubusercontent.com/freeCodeCamp/open-data/master/medium-fCC-data/data/medium_titles%20-%20rawdata.tsv",
sep="\t", )
order | inv-ord | selection | CollTitle | CorrectedTitle | OrigRecommends | classfication | Recommends | class_WD | class_WD&CS | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 541 | 112 | 0.0 | Angular 2 versus React: There Will Be BloodIn ... | Angular 2 versus React: There Will Be Blood | 2.5K | TECH | 2500.0 | 0.0 | 0.0 |
1 | 18 | 635 | 1.0 | The mind-blowing AI announcement from Google t... | The mind-blowing AI announcement from Google t... | 2.4K | TECH | 2400.0 | 0.0 | 0.0 |
2 | 650 | 3 | 0.0 | Things I Wish Someone Had Told Me When I Was L... | Things I Wish Someone Had Told Me When I Was L... | 3.2K | MOT | 3200.0 | 0.0 | 0.0 |
3 | 94 | 559 | 1.0 | How to encrypt your entire life in less than a... | How to encrypt your entire life in less than a... | 7.4K | SEC | 7400.0 | 0.0 | 0.0 |
4 | 409 | 244 | 0.0 | Being A Developer After 40In freeCodeCampView ... | Being A Developer After 40 | 6K | MOT | 6000.0 | 0.0 | 0.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
647 | 636 | 17 | NaN | December 2014 | December 2014 | NaN | date | NaN | NaN | NaN |
648 | 644 | 9 | NaN | November 2014 | November 2014 | NaN | date | NaN | NaN | NaN |
649 | 647 | 6 | NaN | October 2014 | October 2014 | NaN | date | NaN | NaN | NaN |
650 | 649 | 4 | NaN | November 2013 | November 2013 | NaN | date | NaN | NaN | NaN |
651 | 651 | 2 | NaN | October 2013 | October 2013 | NaN | date | NaN | NaN | NaN |
652 rows × 10 columns